This project identifies the characteristics shared by high-performing Airbnb listings in the Berlin area. We do this by comparing information such as price and popularity across all the Airbnb houses in that area. We hope that our results help readers find the type of house that suits them best.
Airbnb has become a popular choice for people who enjoy traveling. However, it still lacks common standards for evaluating the quality of a listing, which makes it difficult for travelers to pick a suitable Airbnb house. This project is a first step toward providing Airbnb investors and traveling customers with a data-driven perspective on the Airbnb lodging market.
The data visualizations in the project give customers and investors a holistic view of the location, popularity, and demand-based price trends of Airbnb estates. Our practical use scenario features a Berlin real-estate investor who wants to understand the monthly availability of estates and the price trends of Airbnb communities across Berlin.
The data is sourced from Kaggle (https://www.kaggle.com/datasets/brittabettendorf/berlin-airbnb-data), which hosts publicly available data from the Airbnb site.
We visualize this dataset, which includes 22,552 Airbnb listings. It suits the project because it contains both quantitative and qualitative data (quantitative: price; qualitative: room type), a time element (calendar dates of Airbnb availability), a geospatial element (location of each Airbnb), and a text element (Airbnb reviews). In addition, we would like to predict the marketable price for each Berlin neighborhood using listing descriptions and identify the busiest times of the year to visit Berlin.
The dataset consists of six files (calendar_summary, listings, listings_summary, neighborhoods, reviews, reviews_summary), which we plan to clean and merge on the Airbnb listing id. Because the listings are located in Berlin, the neighborhood names are in German, so we translate them for the audience’s convenience. We use Python as the primary tool for data cleaning and munging.
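The cleaning step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual pipeline: the column names (id, listing_id, neighbourhood_group, price), the "$1,200.00" price format, and the one-entry glossary are assumptions made for the example.

```python
from collections import Counter

# Illustrative glossary only (assumption): a real mapping would cover every
# district; names without an entry are kept as they are.
NEIGHBOURHOOD_LABELS = {"Mitte": "Mitte (Centre)"}


def parse_price(text):
    """Turn a price string such as '$1,200.00' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def clean_and_merge(listings, reviews):
    """Left-join review counts onto listings using the listing id."""
    review_counts = Counter(r["listing_id"] for r in reviews)
    merged = []
    for row in listings:
        name = row["neighbourhood_group"]
        merged.append({
            "id": row["id"],
            "neighbourhood": NEIGHBOURHOOD_LABELS.get(name, name),
            "price": parse_price(row["price"]),
            # Listings without any review get a count of 0.
            "review_count": review_counts.get(row["id"], 0),
        })
    return merged
```

Reviews whose listing id does not appear in the listings file are dropped by this join, which keeps one row per listing.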
Britta is a new real-estate investor in Berlin, Germany. She wants to understand what makes an Airbnb estate highly marketable, because she plans to invest in properties that will attract Airbnb tenants. She wants to explore a dataset to predict the price for each Berlin neighborhood using listing descriptions and to identify the busiest times of the year to visit Berlin. When Britta logs on to the “Airbnb Trends app”, she sees an overview of all the available variables in her dataset, including each estate’s listing description, price, and count of tenants per month. She can filter by listing description and neighborhood location to discover the average estate price per neighborhood.
Real-estate Investor Britta
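The filter that Britta applies in this scenario amounts to a grouped average. The sketch below is one possible way to compute it, assuming each cleaned row carries hypothetical neighbourhood, price, and description fields; the keyword match is a simple stand-in for whatever filter the app offers.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_neighbourhood(rows, keyword=None):
    """Average price per neighbourhood, optionally restricted to listings
    whose description contains `keyword` (case-insensitive)."""
    prices = defaultdict(list)
    for row in rows:
        if keyword and keyword.lower() not in row["description"].lower():
            continue  # listing does not match the description filter
        prices[row["neighbourhood"]].append(row["price"])
    return {name: mean(values) for name, values in prices.items()}
```

Calling the function without a keyword gives the overall average per neighborhood; passing a keyword such as "balcony" narrows it to matching listings.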
The user can zoom in to see the details of each cluster and zoom out to check the overall distribution of Airbnb houses in the Berlin area. The user can also click on a cluster to see the information for each individual Airbnb house it contains. The color of a cluster encodes its size: orange means the cluster contains a high number of Airbnb houses, yellow a medium number, and green a low number.
Figure 1 is the innovative visualization of our project. It is a cluster plot that shows the distribution of Airbnb services in the Berlin area. Each point represents the location of one Airbnb house. The points are grouped into clusters, and the color of each cluster indicates its size. By viewing this figure, the reader can check the location of preferred Airbnb houses and access details such as the room type of each estate.
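The color coding of the clusters can be illustrated with a small sketch. It uses simple grid binning as a stand-in for the map's real clustering, which the report does not specify, and the cell size and the thresholds of 10 and 100 are hypothetical values chosen only for the example.

```python
import math
from collections import Counter


def grid_clusters(points, cell_deg=0.01):
    """Count (lat, lon) points per square grid cell of `cell_deg` degrees.
    Grid binning is a simplified stand-in for the map's clustering."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )


def cluster_color(count, low=10, high=100):
    """Map a cluster size to the color scheme described in the text.
    The thresholds `low` and `high` are assumptions."""
    if count < low:
        return "green"   # few houses in the cluster
    if count < high:
        return "yellow"  # medium number of houses
    return "orange"      # many houses
```

Zooming in on the map corresponds to shrinking `cell_deg`: clusters split into smaller groups and their colors shift toward green.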
The color of each data point reflects the price of the corresponding Airbnb house in the Berlin area. Dark red marks the houses with the highest prices, dark blue marks the houses with the lowest prices, and the colors in between mark houses with mid-range prices. Users can zoom in to see details such as the location of the points in each color region, and they can also check the landmarks around each data point.
Figure 2 is a coordinate plot in which differences in color represent differences in price. The plot also records the coordinates of each house and the landmarks around it. By viewing Figure 2, readers can get a quick impression of prices in the region they plan to visit and pick a house that suits them.
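A price-to-color mapping of this kind can be sketched as a linear interpolation between two RGB values. The endpoint colors below and the straight two-color blend are assumptions for illustration; the actual figure may use a different palette for the mid-range prices.

```python
def price_to_rgb(price, lowest, highest, cold=(0, 0, 139), hot=(139, 0, 0)):
    """Interpolate from `cold` (dark blue) at the lowest price to `hot`
    (dark red) at the highest price. Both colors are assumed values."""
    if highest == lowest:
        t = 0.5  # all prices equal: use the midpoint color
    else:
        t = (price - lowest) / (highest - lowest)
    t = min(1.0, max(0.0, t))  # clamp prices outside the range
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))
```

Prices outside the chosen range are clamped, so a few extreme listings do not wash out the color contrast among the rest.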
We created the interactive web graphs from ggplot2 visualizations with the ggiraph R package. Hovering over an element in one plot highlights the corresponding element in the other. The tooltip adds value to the plot by highlighting and comparing the counts of estates in the two categories for the same neighborhood.
Figure 3 is the linked view visualization of our project. It is a linked highlight plot that compares the counts of estates in two categories (high-level price > 200 / low-level price <= 200) for the same neighborhood. Each bar represents a neighborhood district in Berlin, and the neighborhoods are ordered from the most estates to the fewest. By viewing this figure, the reader can see how many high-priced and how many low-priced estates each neighborhood contains, which supports a better-informed choice of Airbnb.
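The counts behind the two linked bar charts can be reproduced with a short sketch. The threshold of 200 comes from the text; the row fields (neighbourhood, price) are assumed names for the cleaned data.

```python
from collections import Counter

PRICE_THRESHOLD = 200  # from the text: high-level is > 200, low-level is <= 200


def count_by_price_level(rows):
    """Return two Counters (high, low) of estates per neighbourhood."""
    high, low = Counter(), Counter()
    for row in rows:
        target = high if row["price"] > PRICE_THRESHOLD else low
        target[row["neighbourhood"]] += 1
    return high, low


def ranked(counter):
    """Neighbourhoods ordered from the most estates to the fewest."""
    return counter.most_common()
```

Because both counters share the neighborhood name as their key, a hover on one bar can look up the matching bar in the other chart directly.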
In the visualization below, users can view the trend in Airbnb estate availability. The user can click on and hover over the trend line to inspect the fluctuations in availability.
The “Availability of Airbnb Estates” trend plot lets users view the demand for Airbnb reservations over a calendar year. Users can identify a correlation between high demand for Airbnbs and holidays. For example, New Year’s Eve and New Year’s Day are the days with the lowest availability. Furthermore, availability decreases steadily as the summer months approach. Therefore, one can conclude that holidays and the summer season attract the most Airbnb customers.
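The trend line can be derived from the calendar data by computing, for each date, the share of listings that are still available. The sketch below assumes calendar rows with a date string and an available flag encoded as "t" or "f"; both field names and the encoding are assumptions about the file layout.

```python
from collections import defaultdict


def availability_by_date(calendar_rows):
    """Share of listings marked available ('t') on each date."""
    total = defaultdict(int)
    free = defaultdict(int)
    for row in calendar_rows:
        total[row["date"]] += 1
        if row["available"] == "t":
            free[row["date"]] += 1
    return {day: free[day] / total[day] for day in sorted(total)}


def least_available(shares, n=1):
    """The n dates with the lowest availability share."""
    return sorted(shares, key=shares.get)[:n]
```

Dates returned by `least_available` are the candidates to compare against the holiday calendar.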
The scatter points show the number of listing reviews recorded from 2010 to 2018, and the line through them shows how the average number of reviews changes from year to year. Users can see a positive trend in popularity over time.
Figure 5 shows the change in popularity of the Airbnb service in the Berlin area from 2010 to 2018. The plot shows a significant increase in the number of reviews each year, which indicates that more people comment on Airbnb services every year. We interpret this as a reflection of increased popularity.
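The yearly figures behind such a plot can be computed from the review records. This is a sketch under assumptions: each review row is taken to have a listing_id and an ISO-formatted date, and the average is taken over listings that received at least one review in that year, which may differ from the averaging used for the figure.

```python
from collections import Counter, defaultdict


def reviews_per_year(review_rows):
    """Total reviews and mean reviews per reviewed listing, by year."""
    per_year = defaultdict(Counter)  # year -> listing id -> review count
    for row in review_rows:
        year = int(row["date"][:4])  # assumes dates like '2018-05-01'
        per_year[year][row["listing_id"]] += 1
    summary = {}
    for year in sorted(per_year):
        counts = per_year[year]
        total = sum(counts.values())
        summary[year] = {
            "total": total,
            "mean_per_listing": total / len(counts),
        }
    return summary
```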
Developing visualizations of the location, popularity, and price trends of Airbnb estates in Berlin leads us to discuss the relevance and practical implications of the project’s results. We first discuss the innovative view plot (fig. 1), in which the user can explore the cluster distribution of Airbnb houses on a map of Berlin. The user can observe that the houses within a cluster lie very close to each other, while the clusters themselves are spread over the entire city and differ mainly in size. This matches the way Airbnb services develop: unlike traditional hotels, Airbnb houses are set up by individual owners. Individual ownership has made Airbnb rentals convenient and accessible across Berlin. The user can infer that houses in the same cluster share similar amenities and local surroundings, such as malls, restaurants, and outdoor recreation. Hence, location is vital to the marketability of an estate. The insights from the innovative view plot best serve users as a guide for investigating preferred estates based on their proximity to other Airbnb properties.
Next, we discuss the coordinate plot (fig. 2), in which the user can review the prices of Airbnb houses in the Berlin area through the color coding. As stated in the introduction, one of the project’s goals is to discover the price distribution of Airbnb houses. The user can observe that estates tend to be more expensive when they are located in the center of the city or have more landmarks around them. This finding is meaningful because it describes the price pattern of the Airbnb service in the Berlin area and reveals the balance between price and popularity. It also helps readers who are looking for a suitable Airbnb house, since it reduces their uncertainty about price.
Furthermore, we discuss the linked view plot (fig. 3), which lets real estate investors compare luxury and affordable options simply by selecting neighborhoods in the panel. The two parallel plots let users directly observe the distribution and ranking of the two categories (luxury/affordable) in the same neighborhood. By hovering over the graph, real estate investors can conveniently decide where to establish their Airbnb business. For example, if investors want to concentrate their properties in one particular neighborhood, this linked plot offers an entry point for a well-informed decision after comparison.
Next, in the line graph (fig. 4), the user can observe how the availability of Airbnb houses in the Berlin area changes over the selected year. The plot shows that availability drops suddenly around the national holidays. In addition, summer availability is relatively low compared to other periods. This finding is meaningful because it describes the availability of Airbnb houses in the Berlin area and can serve as a key reference when predicting the availability of estates.
Finally, we discuss the scatter plot (fig. 5), in which the user can identify the popularity trend of Airbnb services by observing the number of reviews in each year. An increase in reviews is a good indicator of the growing reputation of the Airbnb service. This suggests that more and more visitors prefer Airbnb houses to traditional hotels, a finding that can encourage investors interested in Airbnb-related business.
Having completed this project, we are confident that our readers will gain a much more comprehensive knowledge of the Airbnb service in the Berlin area. The information captured by our visualizations can offer useful suggestions to both travelers and investors, and the visualizations answer the questions we posed at the beginning of the project. Based on the results, we conclude that Airbnb houses have become an increasingly popular option for travelers: the number of people choosing the Airbnb service has increased each year. Airbnb houses are distributed all over the city of Berlin, and their prices follow a clear pattern, with prices near the city’s center being the highest on average. Overall, the project is meaningful because it explores the patterns of the Airbnb service in the Berlin area from a data science perspective. Patterns such as pricing could benefit readers planning to travel to the Berlin area.
In future work, we propose including more forecasting tools to help real estate investors understand how to increase revenue from Airbnb estates. For example, we have discovered that the availability of the Airbnb service is relatively low during holiday periods. To avoid unnecessary expenses during those periods, investors need to identify the dates on which an estate unusually remains unbooked. In addition, a future study might investigate price fluctuations during these holiday seasons to uncover the trade-off between price and availability.